Abstract

A set of N independent Gaussian linear time invariant systems is observed by M 
sensors whose task is to provide the best possible steady-state causal minimum mean 
square estimate of the state of the systems, in addition to minimizing a steady-state 
measurement cost. The sensors can switch between systems instantaneously, and there 
are additional resource constraints, for example on the number of sensors which can 
observe a given system simultaneously. We first derive a tractable relaxation of the 
problem, which provides a bound on the achievable performance. This bound can be 
computed by solving a convex program involving linear matrix inequalities. Exploiting 
the additional structure of the sites evolving independently, we can decompose this 
program into coupled smaller dimensional problems. In the scalar case with identical 
sensors, we give an analytical expression of an index policy proposed in a more general 
context by Whittle. In the general case, we develop open-loop periodic switching 
policies whose performance matches the bound arbitrarily closely. 



1 Introduction 

Advances in sensor networks and the development of unmanned vehicle systems for intel- 
ligence, reconnaissance and surveillance missions require the development of data fusion 
schemes that can handle measurements originating from a large number of sensors observing 
a large number of targets, see e.g. [1, 2]. These problems have a long history [3], and can 
be used to formulate static sensor scheduling problems as well as trajectory optimization 
problems for mobile sensors [4, 5]. 
dahleh@mit.edu. E. Feron is with the School of Aerospace Engineering, Georgia Tech, Atlanta, GA

In this paper, we consider M mobile sensors tracking the state of sites or targets in
continuous time. We assume that the sites can be described by plants with independent
linear time invariant dynamics,

\[ \dot{x}_i = A_i x_i + B_i u_i + w_i, \quad x_i(0) = x_{i,0}, \quad i = 1, \dots, N. \]



We assume that the plant controls $u_i(t)$ are deterministic and known for $t \geq 0$. Each driving
noise $w_i(t)$ is a stationary white Gaussian noise process with zero mean and known power
spectral density matrix $W_i$, i.e.,

\[ \mathrm{Cov}(w_i(t), w_i(t')) = W_i \, \delta(t - t'). \]

The initial conditions are random variables with known means $\bar{x}_{i,0}$ and covariance matrices
$\Sigma_{i,0}$. By independent systems we mean that the noise processes of the different plants are
independent, as well as the initial conditions $x_{i,0}$. Moreover, the initial conditions are assumed
independent of the noise processes. We shall assume in addition that

Assumption 1. The matrices $\Sigma_{i,0}$ are positive definite for all $i \in \{1, \dots, N\}$.

This can be achieved by adding an arbitrarily small multiple of the identity matrix to a
potentially non-invertible matrix $\Sigma_{i,0}$. This assumption is needed in our discussion to be
able to use the information filter later on and to use a technical theorem on the convergence
of the solutions of a periodic Riccati equation in section 4.3.2.

We assume that we have at our disposal $M$ sensors to observe the $N$ plants. If sensor $j$ is
used to observe plant $i$, we obtain measurements

\[ y_{ij} = C_{ij} x_i + v_{ij}. \]

Here $v_{ij}$ is a stationary white Gaussian noise process with power spectral density matrix $V_{ij}$,
assumed positive definite. Also, $v_{ij}$ is independent of the other measurement noises, process
noises, and initial states. Finally, to guarantee convergence of the filters later on, we assume
throughout that

Assumption 2. For all $i \in \{1, \dots, N\}$, there exists a set of indices $j_1, j_2, \dots, j_{n_i} \in \{1, \dots, M\}$
such that the pair $(A_i, \tilde{C}_i)$ is detectable, where $\tilde{C}_i = [C_{ij_1}^\top, \dots, C_{ij_{n_i}}^\top]^\top$.

Assumption 3. For all $i \in \{1, \dots, N\}$, the pair $(A_i, W_i^{1/2})$ is controllable.

Let us define

\[ \pi_{ij}(t) = \begin{cases} 1 & \text{if plant } i \text{ is observed at time } t \text{ by sensor } j, \\ 0 & \text{otherwise.} \end{cases} \]

We assume that each sensor can observe at most one system at each instant, hence we have
the constraint

\[ \sum_{i=1}^{N} \pi_{ij}(t) \leq 1, \quad \forall t, \; j = 1, \dots, M. \tag{1} \]
If instead sensor $j$ is required to be always operated, constraint (1) should simply be changed
to

\[ \sum_{i=1}^{N} \pi_{ij}(t) = 1. \tag{2} \]

The equality constraint is useful in scenarios involving sensors mounted on unmanned vehicles 
for example, where it might not be possible to withdraw a vehicle from operation during the 
mission. The performance will be worse in general than with an inequality constraint once 
we introduce operation costs. 

We also add the following constraint, similar to the one used by Athans [6]. We suppose
that each system can be observed by at most one sensor at each instant, so we have

\[ \sum_{j=1}^{M} \pi_{ij}(t) \leq 1, \quad \forall t, \; i = 1, \dots, N. \tag{3} \]

Similarly, if system $i$ must always be observed by some sensor, constraint (3) can be changed
to an equality constraint

\[ \sum_{j=1}^{M} \pi_{ij}(t) = 1. \tag{4} \]

Note that a sensor in our discussion can correspond to a combination of several physical
sensors, and so the constraints above can capture seemingly more general problems where
we allow, for example, more than one simultaneous measurement per system. Using (4) we
could also impose a constraint on the total number of allowed observations at each time.
Indeed, consider a constraint of the form

\[ \sum_{i=1}^{N} \sum_{j=1}^{M} \pi_{ij}(t) \leq p, \quad \text{for some positive integer } p. \]

This constraint means that $M - p$ sensors are required to be idle at each time. So we can
create $M - p$ "dummy" systems (we should choose simple scalar stable systems to minimize
computations), and associate the constraint (4) to each of them. Then we simply do not
include the covariance matrix of these systems in the objective function (5) below.
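The resource constraints (1)-(4) are simple to check for a candidate assignment at a given instant. The following is a minimal sketch, not part of the original development: the function name `is_feasible` and its two flags are hypothetical, and the assignment is assumed to be given as a nested list `pi[i][j]` with entries in {0, 1}.

```python
def is_feasible(pi, sensors_always_on=False, systems_always_observed=False):
    """Check constraints (1) and (3), or their equality versions (2) and (4).

    pi[i][j] is 1 if plant i is observed by sensor j at the current instant,
    and 0 otherwise (hypothetical data layout chosen for this sketch).
    """
    n_plants = len(pi)
    n_sensors = len(pi[0]) if n_plants else 0
    # Every entry must be binary.
    if any(v not in (0, 1) for row in pi for v in row):
        return False
    # Constraint (1), or (2) when each sensor must always be operated.
    for j in range(n_sensors):
        load = sum(pi[i][j] for i in range(n_plants))
        if load > 1 or (sensors_always_on and load != 1):
            return False
    # Constraint (3), or (4) when each system must always be observed.
    for i in range(n_plants):
        cover = sum(pi[i])
        if cover > 1 or (systems_always_observed and cover != 1):
            return False
    return True
```

For instance, with three plants and two sensors, `is_feasible([[1, 0], [0, 1], [0, 0]])` accepts the assignment, while an assignment in which one sensor observes two plants is rejected.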

We consider an infinite-horizon average cost problem. The parameters of the model are
assumed known. We wish to design an observation policy $\pi(t) = \{\pi_{ij}(t)\}$ satisfying the
constraints (1), (3), or their equality versions, and an estimator $\hat{x}_\pi$ of $x$, depending at each
instant only on the past and current observations produced by the observation policy, such
that the average error covariance is minimized, in addition to some observation costs. The
policy $\pi$ itself can also only depend on the past observations. More precisely, we wish to
minimize, subject to the constraints (1), (3),

\[ J_{\mathrm{avg}} = \min_{\pi, \hat{x}_\pi} \limsup_{T \to \infty} \frac{1}{T} E \left[ \int_0^T \sum_{i=1}^{N} \left( (x_i - \hat{x}_{\pi,i})^\top T_i (x_i - \hat{x}_{\pi,i}) + \sum_{j=1}^{M} \kappa_{ij} \pi_{ij}(t) \right) dt \right], \tag{5} \]

where the constants $\kappa_{ij}$ are the costs paid per unit of time when plant $i$ is observed by sensor
$j$. The $T_i$'s are positive semidefinite weighting matrices.

Literature Review and Contributions of this paper. The sensor scheduling problem presented
above, except for minor variations, is an infinite horizon version of the problem considered by 
Athans in [6]. See also Meier et al. [3] for the discrete-time version. Athans considered the 
observation of only one plant. We include here several plants to show how their independent 
evolution property can be leveraged in the computations, using the dual decomposition 
method from optimization. 

Discrete-time versions of this sensor selection problem have received a significant amount 
of attention, see e.g. [7, 8, 9, 4, 10, 11, 12]. All algorithms proposed so far, except for the 
optimal greedy policy of [11] in the completely symmetric case, either run in exponential time
or consist of heuristics with no performance guarantee. We do not consider the discrete-time 
problem in this paper. Finite-horizon continuous-time versions of the problem, besides the 
presentation of Athans [6], have also been the subject of several papers [13, 14, 15, 16]. The 
solutions proposed, usually based on optimal control techniques, also involve computational 
procedures that scale poorly with the dimension of the problem. 

Somewhat surprisingly however, and with the exception of [17], it seems that the infinite- 
horizon continuous time version of the Kalman filter scheduling problem has not been consid- 
ered previously. Mourikis and Roumeliotis [17] consider initially also a discrete time version 
of the problem for a particular robotic application. However, their discrete model originates 
from the sampling at high rate of a continuous time system. To cope with the difficulty 
of determining a sensor schedule, they assume instead a model where each sensor can in- 
dependently process each of the available measurements at a constant frequency, and seek 
the optimal measurement frequencies. In fact, they obtain these frequencies by introducing 
heuristically a continuous time Riccati equation, and show that the frequencies can then be 
computed by solving a semidefinite program. In contrast, we consider the more standard 
schedule-based version of the problem in continuous time, which is a priori more constrain- 
ing. We show that essentially the same convex program provides in fact a lower bound on 
the cost achievable by any measurement policy. In addition, we provide additional insight 
into the decomposition of the computations of this program, which can be useful in the 
framework of [17] as well. 

The rest of the paper is organized as follows. Section 2 briefly recalls that for a fixed policy
$\pi(t)$, the optimal estimator is obtained by a type of Kalman-Bucy filter. The properties of
the Kalman filter (independence of the error covariance matrix with respect to measurement
values) imply that the remaining problem of finding the optimal scheduling policy $\pi$ is a
deterministic control problem. In section 3 we treat a simplified scalar version of the problem 
with identical sensors as a special case of the classical "Restless Bandit Problem" (RBP) 
[18], and provide analytical expressions for an index policy and for the elements necessary 
to compute efficiently a lower bound on performance, both of which were proposed in the 
general setting of the RBP by Whittle. Then, for the multidimensional case treated in full 
generality in section 4, we show that the lower bound on performance can be computed as 
a convex program involving linear matrix inequalities. This lower bound can be approached 
arbitrarily closely by a family of new periodically switching policies described in section 4.3.
Approaching the bound with these policies is limited only by the frequency with which the 
sensors can actually switch between the systems. In general, our solution has much more 
attractive computational properties than the solutions proposed so far for the finite-horizon 
problem. 



2 Optimal Estimator 



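For a given observation policy $\pi(t) = \{\pi_{ij}(t)\}_{i,j}$, the minimum variance filter is given by the
Kalman-Bucy filter [19], see [6]. The state estimates $\hat{x}_\pi$, where the subscript indicates the
dependency on the policy $\pi$, are all updated in parallel following the stochastic differential
equation

\[ \frac{d}{dt} \hat{x}_{\pi,i}(t) = A_i \hat{x}_{\pi,i}(t) + B_i u_i(t) + \Sigma_{\pi,i}(t) \sum_{j=1}^{M} \pi_{ij}(t) \, C_{ij}^\top V_{ij}^{-1} \left( y_{ij}(t) - C_{ij} \hat{x}_{\pi,i}(t) \right), \quad \hat{x}_{\pi,i}(0) = \bar{x}_{i,0}. \]

The resulting estimator is unbiased and the error covariance matrix $\Sigma_{\pi,i}(t)$ for site $i$ verifies
the matrix Riccati differential equation

\[ \frac{d}{dt} \Sigma_{\pi,i}(t) = A_i \Sigma_{\pi,i}(t) + \Sigma_{\pi,i}(t) A_i^\top + W_i - \Sigma_{\pi,i}(t) \left( \sum_{j=1}^{M} \pi_{ij}(t) \, C_{ij}^\top V_{ij}^{-1} C_{ij} \right) \Sigma_{\pi,i}(t), \quad \Sigma_{\pi,i}(0) = \Sigma_{i,0}. \tag{6} \]

With this result, we can reformulate the optimization of the observation policy as a
deterministic optimal control problem. Rewriting

\[ E\left( (x_i - \hat{x}_i)^\top T_i (x_i - \hat{x}_i) \right) = \mathrm{Tr}(T_i \Sigma_i), \]

the problem is to compute

\[ \min_{\pi} \limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^{N} \left( \mathrm{Tr}(T_i \Sigma_{\pi,i}(t)) + \sum_{j=1}^{M} \kappa_{ij} \pi_{ij}(t) \right) dt, \tag{7} \]

subject to the constraints (1), (3), or their equality versions, and the dynamics (6).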
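Since the cost of a fixed policy depends only on the deterministic Riccati dynamics, it can be approximated numerically. The sketch below is an illustration under stated assumptions rather than the method of the paper: it handles a single site, uses a classical fixed-step Runge-Kutta scheme with the sensor choice frozen over each step, and the names `riccati_rhs`, `average_cost`, `gains`, `kappas` and `schedule` are hypothetical.

```python
import numpy as np

def riccati_rhs(Sigma, A, W, G):
    """Right-hand side of the Riccati equation for one site.

    G stands for sum_j pi_ij(t) C_ij^T V_ij^{-1} C_ij at the current time.
    """
    return A @ Sigma + Sigma @ A.T + W - Sigma @ G @ Sigma

def average_cost(A, W, Tw, Sigma0, gains, kappas, schedule, horizon, dt=1e-3):
    """Approximate (1/T) * int_0^T [Tr(Tw Sigma(t)) + kappa_j(t)] dt for one site.

    gains[j]  : C_j^T V_j^{-1} C_j for sensor j (assumed precomputed)
    kappas[j] : measurement cost rate of sensor j
    schedule  : function t -> sensor index j, or None when the site is not observed
    Returns the average cost and the final covariance.
    """
    Sigma = Sigma0.copy()
    zero = np.zeros_like(A)
    total = 0.0
    steps = int(round(horizon / dt))
    for k in range(steps):
        j = schedule(k * dt)
        G = zero if j is None else gains[j]
        rate = 0.0 if j is None else kappas[j]
        # Left Riemann sum for the running cost.
        total += (np.trace(Tw @ Sigma) + rate) * dt
        # Classical RK4 step with the sensor choice frozen over [t, t + dt].
        k1 = riccati_rhs(Sigma, A, W, G)
        k2 = riccati_rhs(Sigma + 0.5 * dt * k1, A, W, G)
        k3 = riccati_rhs(Sigma + 0.5 * dt * k2, A, W, G)
        k4 = riccati_rhs(Sigma + dt * k3, A, W, G)
        Sigma = Sigma + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return total / horizon, Sigma
```

Passing a periodic `schedule` gives a quick way to evaluate an open-loop switching policy on one site; the step `dt` should be small compared with the switching period.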



3 Sites with One-Dimensional Dynamics and Identical 
Sensors 

We assume in this section that

1. the sites or targets have one-dimensional dynamics, i.e., $x_i \in \mathbb{R}$, $i = 1, \dots, N$; and,

2. all the sensors are identical, i.e., $C_{ij} = C_i$, $V_{ij} = V_i$, $\kappa_{ij} = \kappa_i$, $j = 1, \dots, M$.

Because of condition 2, we can simplify the problem formulation introduced above so that
it corresponds exactly to a special case of the Restless Bandit Problem [18]. We define

\[ \pi_i(t) = \begin{cases} 1 & \text{if plant } i \text{ is observed at time } t \text{ by a sensor,} \\ 0 & \text{otherwise.} \end{cases} \]

Since we assumed that a system can be observed by at most one sensor, the scheduling
problem is interesting only in the case $M < N$. Note that a constraint (4) for some system $i$
can be eliminated by removing one available sensor, which is always measuring system
$i$. Constraints (2) and (3) can then be replaced by the single constraint

\[ \sum_{i=1}^{N} \pi_i(t) = M, \quad \forall t. \]

This constraint means that at each instant, exactly $M$ of the sites are observed. We treat
this case in this section, but again the equality sign can be changed to an inequality with
very little change in our discussion.

To obtain a lower bound on the achievable performance, we relax the constraint to enforce
it only on average

\[ \limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^{N} \pi_i(t) \, dt = M. \tag{8} \]

Then we adjoin this constraint using a (scalar) Lagrange multiplier $\lambda$ to form the Lagrangian

\[ L(\pi, \lambda) = \limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^{N} \left[ \mathrm{Tr}(T_i \Sigma_{\pi,i}(t)) + (\kappa_i + \lambda) \pi_i(t) \right] dt - \lambda M. \]

Here $\kappa_i$ is the cost per time unit for observing site $i$. The dynamics of $\Sigma_{\pi,i}$ are now given by

\[ \frac{d}{dt} \Sigma_{\pi,i}(t) = A_i \Sigma_{\pi,i}(t) + \Sigma_{\pi,i}(t) A_i^\top + W_i - \pi_i(t) \, \Sigma_{\pi,i}(t) C_i^\top V_i^{-1} C_i \Sigma_{\pi,i}(t), \quad \Sigma_{\pi,i}(0) = \Sigma_{i,0}. \tag{9} \]

Then the original optimization problem (7) with the relaxed constraint (8) can be expressed
as

\[ \gamma = \inf_{\pi} \sup_{\lambda} L(\pi, \lambda) = \sup_{\lambda} \inf_{\pi} L(\pi, \lambda), \]

where the exchange of the supremum and the infimum can be justified using a minimax
theorem for constrained dynamic programming [20]. We are then led to consider the computation
of the dual function

\[ \gamma(\lambda) = \min_{\pi} \limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^{N} \left[ \mathrm{Tr}(T_i \Sigma_{\pi,i}(t)) + (\kappa_i + \lambda) \pi_i(t) \right] dt - \lambda M, \]

which has the important property of being separable by site, i.e., $\gamma(\lambda) + \lambda M = \sum_{i=1}^{N} \gamma^i(\lambda)$,
where for each site $i$ we have

\[ \gamma^i(\lambda) = \min_{\pi_i} \limsup_{T \to \infty} \frac{1}{T} \int_0^T \left[ \mathrm{Tr}(T_i \Sigma_{\pi,i}(t)) + (\kappa_i + \lambda) \pi_i(t) \right] dt. \tag{10} \]

When the dynamics of the sites are one-dimensional, i.e., $\Sigma_i \in \mathbb{R}$, we can solve this optimal
control problem for each site analytically, that is, we obtain an analytical expression of the
dual function, which provides a lower bound on the cost for each $\lambda$. The computations are
presented in paragraph 3.2. First, we explain how these computations will also provide the
elements necessary to design a scheduling policy.

3.1 Restless Bandits 

The Restless Bandit Problem (RBP) was introduced by Whittle in [18] as a generalization of
the classical Multi-Armed Bandit Problem (MABP), which was first solved by Gittins [21].
In the RBP, we have $N$ projects evolving independently, $M$ of which can be activated at each
time. Projects that are active can evolve according to different dynamics than projects that
remain passive. In our problem, the projects correspond to the systems and their activation
corresponds to taking a measurement. We describe in our particular context the index policy
proposed by Whittle for the RBP, which, although suboptimal in general, generalizes the
optimal policy of Gittins in the case of the MABP.

Consider the objective (10) for system $i$. Clearly, the Lagrange multiplier $\lambda$ can be interpreted
as a tax penalizing measurements of the system. As $\lambda$ increases, the passive action (i.e., not
measuring) should become more attractive. For a given value of $\lambda$, let us denote by $\mathcal{P}^i(\lambda)$ the
set of covariance matrices $\Sigma_i$ for which the passive action is optimal. Let $\mathbb{S}^n_+$ be the set of
symmetric positive semidefinite matrices. Then we say that

Definition 4. System $i$ is indexable if and only if $\mathcal{P}^i(\lambda)$ is monotonically increasing (in the
sense of set inclusion) from $\emptyset$ to $\mathbb{S}^n_+$ as $\lambda$ increases from $-\infty$ to $+\infty$. If system $i$ is indexable,
we define its Whittle index by $\lambda_i(\Sigma_i) = \inf\{\lambda \in \mathbb{R} : \Sigma_i \in \mathcal{P}^i(\lambda)\}$.

However natural the indexability requirement might appear, Whittle provided an example
of an RBP where it is not verified. We will see in the next paragraph however that for our
particular problem, at least in the scalar case, indexability of the systems is guaranteed.
The idea behind the definition of the Whittle index consists in defining an intrinsic "value"
for the measurement of system $i$, taking into account both the immediate and future gains.
If the covariance of system $i$ is $\Sigma_i$, the Whittle index defines this value as the measurement
tax (potentially negative) that should be required to make the controller indifferent between
measuring and not measuring the system. Finally, if all the systems are indexable, the
Whittle policy chooses at each instant to measure the $M$ systems with highest Whittle index.
There is significant experimental data and some theoretical evidence indicating that when
the Whittle policy is well-defined for an RBP, its performance is often very close to optimal,
see e.g. [22, 23, 24].
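The selection step of the Whittle policy can be sketched in a few lines. This is an illustrative sketch only: the function name `whittle_policy` is hypothetical, and the index functions are assumed to be supplied by the caller (ties are broken arbitrarily by system number).

```python
import heapq

def whittle_policy(variances, index_fns, num_sensors):
    """Return the systems to measure: those with the num_sensors largest indices.

    variances[i] : current error variance of system i
    index_fns[i] : callable Sigma -> Whittle index of system i (assumed given)
    """
    scores = [(index_fns[i](variances[i]), i) for i in range(len(variances))]
    # Keep the num_sensors systems with the highest index values.
    chosen = heapq.nlargest(num_sensors, scores)
    return sorted(i for _, i in chosen)
```

In a simulation, this function would be called at every integration step with the current variances, and the returned systems would be the ones measured over that step.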
3.2 Solution of the Scalar Optimal Control Problem

We can now consider problem (10) for a single site, dropping the index $i$. We return to the
scalar case $\Sigma \in \mathbb{R}$. The dynamical evolution of the variance obeys the equation

\[ \dot{\Sigma} = 2A\Sigma + W - \pi \frac{C^2}{V} \Sigma^2, \quad \text{with } \pi(t) \in \{0, 1\}. \]

The Hamilton-Jacobi-Bellman (HJB) equation is

\[ \gamma(\lambda) = \min \left\{ T\Sigma + (2A\Sigma + W) h'(\Sigma; \lambda), \; T\Sigma + \kappa + \lambda + \left( 2A\Sigma + W - \frac{C^2}{V} \Sigma^2 \right) h'(\Sigma; \lambda) \right\}, \tag{11} \]

where $h$ is the relative value function. We will use the following notation. Consider the
algebraic Riccati equation (ARE)

\[ 2Ax + W - \frac{C^2}{V} x^2 = 0. \]

First, if $T = 0$, it is clearly optimal to always observe if $\lambda + \kappa < 0$ and never observe
otherwise. Hence the Whittle index is $\lambda(\Sigma) = -\kappa$ for all $\Sigma \in \mathbb{R}_+$, and $\gamma(\lambda) = \min\{\kappa + \lambda, 0\}$.
So we can now assume $T > 0$. If $C = 0$, the solution to (11) is to always observe if $\kappa + \lambda < 0$
and never observe otherwise. Hence the Whittle index is again $\lambda(\Sigma) = -\kappa$ for all $\Sigma \in \mathbb{R}_+$
and we get, by letting $\Sigma = -\frac{W}{2A}$ in the HJB equation for a stable system:

\[ \text{for } C = 0: \quad \gamma(\lambda) = \begin{cases} -\dfrac{TW}{2A} + \kappa + \lambda & \text{if the system is stable } (A < 0) \text{ and } \kappa + \lambda < 0, \\ -\dfrac{TW}{2A} & \text{if the system is stable and } \kappa + \lambda \geq 0, \\ +\infty & \text{otherwise } (A \geq 0). \end{cases} \]

The third case is clear from the fact that the system is unstable and cannot be measured.
So we can now assume that $C \neq 0$. Then the ARE has two roots

\[ x_1 = \frac{A - \sqrt{A^2 + C^2 W / V}}{C^2 / V}, \quad x_2 = \frac{A + \sqrt{A^2 + C^2 W / V}}{C^2 / V}. \]

By assumption 3, $W \neq 0$ and so $x_1$ is strictly negative and $x_2$ is strictly positive.

We can treat the case $\kappa + \lambda \leq 0$ immediately. Then it is obviously optimal to always observe,
and we get, letting $\Sigma = x_2$ in the HJB equation:

\[ \gamma(\lambda) = T x_2 + \kappa + \lambda. \]

So from now on we can assume $\lambda > -\kappa$. Let us temporarily assume the following result on
the form of the optimal policy. The validity of this assumption can be verified a posteriori
from the formulas obtained below, using the fact that the dynamic programming equation
provides a sufficient condition for optimality of a solution.

Form of the optimal policy. The optimal policy is a threshold policy, i.e., it observes the
system for $\Sigma > \Sigma_{th}$ and does not observe for $\Sigma < \Sigma_{th}$, for some $\Sigma_{th} \in \mathbb{R}_+$.

We would like to obtain the value of the average cost $\gamma(\lambda)$ and of the threshold $\Sigma_{th}(\lambda)$.
Note that we already know $\Sigma_{th}(\lambda) = 0$ for $\lambda \leq -\kappa$, and we have $\mathcal{P}(\lambda) = [0, \Sigma_{th}(\lambda)]$ for the
passive region $\mathcal{P}(\lambda)$ of definition 4. Then the system is indexable if and only if $\Sigma_{th}(\lambda)$ is an
increasing function of $\lambda$, and then inverting the relation $\lambda \mapsto \Sigma_{th}(\lambda)$ gives the Whittle index
$\Sigma \mapsto \lambda(\Sigma)$.



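3.2.1 Case $\Sigma_{th} \leq x_2$

In this case, we obtain as before

\[ \gamma(\lambda) = T x_2 + \kappa + \lambda. \tag{12} \]

This is intuitively clear for $\Sigma_{th} < x_2$: even when observing the system all the time, the
variance still converges in finite time to a neighborhood of $x_2$. Since this neighborhood is in
the active region by hypothesis, after potentially a transition period (if the variance started
at a value smaller than $\Sigma_{th}$), we should always observe, and so the infinite-horizon average
cost is the same as for the policy that always observes.

By continuity at the interface between the active and passive regions, we have

\[ T\Sigma_{th} + (2A\Sigma_{th} + W) h'(\Sigma_{th}) = T\Sigma_{th} + (\kappa + \lambda) + \left( 2A\Sigma_{th} + W - \frac{C^2}{V}\Sigma_{th}^2 \right) h'(\Sigma_{th}), \]

i.e.,

\[ \kappa + \lambda = \frac{C^2}{V} \Sigma_{th}^2 \, h'(\Sigma_{th}). \]

We have then

\begin{align*}
\frac{C^2}{V}\Sigma_{th}^2 (T x_2 + \kappa + \lambda) &= \frac{C^2}{V} T \Sigma_{th}^3 + (2A\Sigma_{th} + W)(\kappa + \lambda), \\
\left( -\frac{C^2}{V}\Sigma_{th}^2 + 2A\Sigma_{th} + W \right)(\kappa + \lambda) &= \frac{C^2}{V} T \Sigma_{th}^2 (x_2 - \Sigma_{th}), \\
-(\Sigma_{th} - x_2)(\Sigma_{th} - x_1)(\kappa + \lambda) &= T \Sigma_{th}^2 (x_2 - \Sigma_{th}) \quad (\text{since } C \neq 0), \\
T\Sigma_{th}^2 - \Sigma_{th}(\kappa + \lambda) + x_1(\kappa + \lambda) &= 0,
\end{align*}

so

\[ \lambda(\Sigma_{th}) = -\kappa + \frac{T\Sigma_{th}^2}{\Sigma_{th} - x_1}, \quad \Sigma_{th}(\lambda) = \frac{1}{2}\left( \frac{\lambda + \kappa}{T} + \sqrt{\frac{\lambda + \kappa}{T}\left( \frac{\lambda + \kappa}{T} - 4x_1 \right)} \right). \tag{13} \]

Expressions (12) and (13) are valid under the condition $\Sigma_{th}(\lambda) \leq x_2$. Note from (13) that
$\Sigma_{th} \mapsto \lambda(\Sigma_{th})$ is an increasing function and the functions $\lambda(\cdot)$ and $\Sigma_{th}(\cdot)$ are inverses of each
other.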



3.2.2 Case $\Sigma_{th} > x_2$

It turns out that in this case we must distinguish between stable and unstable systems. For
a stable system ($A < 0$), the Lyapunov equation

\[ 2Ax + W = 0 \]

has a strictly positive solution $x_e = -\frac{W}{2A}$, with $x_e > x_2$ since $C \neq 0$.

Stable System ($A < 0$) with $\Sigma_{th} \geq x_e$:

In this case we know that $x_e$ is in the passive region. Hence, with $\Sigma = x_e$ in the HJB
equation, we get

\[ \gamma(\lambda) = T x_e. \tag{14} \]

Then we have again $\kappa + \lambda = \frac{C^2}{V}\Sigma_{th}^2 h'(\Sigma_{th})$, and now $T\Sigma_{th} + (2A\Sigma_{th} + W)h'(\Sigma_{th}) = T x_e$.
Hence

\[ \frac{C^2}{V}\Sigma_{th}^2 \, T (x_e - \Sigma_{th}) = (2A\Sigma_{th} + W)(\kappa + \lambda) = 2A(\Sigma_{th} - x_e)(\kappa + \lambda), \]

\[ \lambda(\Sigma_{th}) = -\kappa + \frac{T C^2 \Sigma_{th}^2}{2|A|V}, \quad \Sigma_{th}(\lambda) = \sqrt{\frac{2|A|V(\lambda + \kappa)}{T C^2}}. \tag{15} \]

Stable System ($A < 0$) with $x_2 < \Sigma_{th} < x_e$, or non-stable system ($A \geq 0$):

If the system is marginally stable or unstable, we cannot define $x_e$. We can think of this
case as $x_e \to \infty$ as $A \to 0_-$, and treat it simultaneously with the case where the system is
stable and $x_2 < \Sigma_{th} < x_e$. Then $x_2$ is in the passive region, and $x_e$ is in the active region, so
the prefactors of $h'(x)$ in the HJB equation do not vanish. There is no immediate relation
providing the value of $\gamma(\lambda)$. We can use the smooth-fit principle to handle this case and
obtain the expression of the Whittle indices, following [18]. Again the formal justification
comes from using the final expressions of the value function thus obtained to verify that it
indeed satisfies the HJB equation.

Theorem 1 ([18], [25]). Consider a continuous-time one-dimensional restless bandit project
$x(t) \in \mathbb{R}$ satisfying

\[ \dot{x}(t) = a_k(x), \quad k = 0, 1, \]

with passive and active cost rates $r_k(x)$, $k = 0, 1$. Assume that $a_0(x)$ does not vanish in
the optimal passive region, and $a_1(x)$ does not vanish in the optimal active region. Then the
Whittle index is given by

\[ \lambda(x) = r_0(x) - r_1(x) + \frac{[a_1(x) - a_0(x)]\,[a_0(x) r_1'(x) - a_1(x) r_0'(x)]}{a_0(x) a_1'(x) - a_1(x) a_0'(x)}. \]

Remark 5. The assumption that $a_0$ and $a_1$ do not vanish in the optimal passive and active
regions respectively excludes the cases previously studied. It is missing from [18], [25], which
therefore provide only an incomplete description of the Whittle indices for one-dimensional
continuous-time deterministic projects.

Proof. The derivation of the expression of the Whittle index can be found in [18], [25, p. 53],
and is valid only under the additional assumption mentioned above. □
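Corollary 6. The Whittle index for the case $x_2 < \Sigma_{th} < x_e$ is given by:

\[ \lambda(\Sigma_{th}) = -\kappa + \frac{C^2 T \Sigma_{th}^3}{2V(A\Sigma_{th} + W)}. \tag{16} \]

Proof. For $x_2 < \Sigma_{th} < x_e$, the assumptions of theorem 1 are verified with

\[ a_0(\Sigma) = 2A\Sigma + W, \quad a_1(\Sigma) = 2A\Sigma + W - \frac{C^2}{V}\Sigma^2, \quad r_0(\Sigma) = T\Sigma, \quad r_1(\Sigma) = T\Sigma + \kappa. \]

The result follows by a straightforward calculation. Note the expression for $\lambda(\Sigma)$ indeed
makes sense since we can verify that it defines an increasing function of $\Sigma$. □

With the value of the Whittle index, we can finish the computation of the lower bound $\gamma(\lambda)$
for the case $x_2 < \Sigma_{th} < x_e$. Inverting the relation (16), we obtain, for a given value of $\lambda$, the
boundary $\Sigma_{th}(\lambda)$ between the passive and active regions. $\Sigma_{th}(\lambda)$ verifies the depressed cubic
equation

\[ X^3 - \frac{2V(\lambda + \kappa)}{C^2 T} A X - \frac{2V(\lambda + \kappa)}{C^2 T} W = 0. \tag{17} \]

For $\lambda + \kappa > 0$, by Descartes' rule of signs, this polynomial has exactly one positive root,
which is $\Sigma_{th}(\lambda)$.

The HJB equation then reduces to

\[ \gamma(\lambda) = T\Sigma + h'(\Sigma)(2A\Sigma + W), \quad \text{for } \Sigma < \Sigma_{th}(\lambda), \tag{18} \]

\[ \gamma(\lambda) = T\Sigma + \kappa + \lambda + h'(\Sigma)\left( 2A\Sigma + W - \frac{C^2}{V}\Sigma^2 \right), \quad \text{for } \Sigma > \Sigma_{th}(\lambda). \tag{19} \]

Now for $x_2 < \Sigma_{th}(\lambda) < x_e$, letting $x = \Sigma_{th}(\lambda) > 0$ in the HJB equation, assuming continuity
of $h'$ at the boundary of the passive and active regions and eliminating $h'(\Sigma_{th}(\lambda))$, we get

\[ \gamma(\lambda) = T\Sigma_{th}(\lambda) + \kappa + \lambda + \left( \gamma(\lambda) - T\Sigma_{th}(\lambda) \right) \left( 1 - \frac{(C^2/V)\,\Sigma_{th}^2(\lambda)}{2A\Sigma_{th}(\lambda) + W} \right), \]

\[ \gamma(\lambda) = T\Sigma_{th}(\lambda) + \frac{V(\kappa + \lambda)(2A\Sigma_{th}(\lambda) + W)}{C^2\,\Sigma_{th}^2(\lambda)}, \quad \text{for } x_2 < \Sigma_{th}(\lambda) < x_e. \]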

3.2.3 Summary 

We collect the previous computations in the following theorem. 
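Theorem 2. In the one-dimensional Kalman filter scheduling problem with identical sensors,
the systems are indexable. For system $i$, the Whittle index $\lambda_i(\Sigma_i)$ is given as follows:

• Case $C_i = 0$ or $T_i = 0$: $\lambda_i(\Sigma_i) = -\kappa_i$, for all $\Sigma_i \in \mathbb{R}_+$.

• Case $C_i \neq 0$ and $T_i \neq 0$:

\[ \lambda_i(\Sigma_i) = \begin{cases} -\kappa_i + \dfrac{T_i \Sigma_i^2}{\Sigma_i - x_{1,i}} & \text{if } \Sigma_i \leq x_{2,i}, \\ -\kappa_i + \dfrac{C_i^2 T_i \Sigma_i^3}{2V_i(A_i\Sigma_i + W_i)} & \text{if } x_{2,i} < \Sigma_i < x_{e,i}, \\ -\kappa_i + \dfrac{T_i C_i^2 \Sigma_i^2}{2|A_i|V_i} & \text{if } x_{e,i} \leq \Sigma_i, \end{cases} \]

with the convention $x_{e,i} = +\infty$ if $A_i \geq 0$. The lower bound on the achievable performance is
obtained by maximizing the concave function

\[ \gamma(\lambda) = \sum_{i=1}^{N} \gamma^i(\lambda) - \lambda M \tag{20} \]

over $\lambda$, where the term $\gamma^i(\lambda)$ is given by

• Case $T_i = 0$: $\gamma^i(\lambda) = \min\{\lambda + \kappa_i, 0\}$.

• Case $T_i \neq 0$, $C_i = 0$: $\gamma^i(\lambda) = \frac{T_i W_i}{2|A_i|} + \min\{\lambda + \kappa_i, 0\}$ if $A_i < 0$, $\gamma^i(\lambda) = +\infty$ if $A_i \geq 0$.

• Case $C_i \neq 0$ and $T_i \neq 0$:

\[ \gamma^i(\lambda) = \begin{cases} T_i x_{2,i} + \kappa_i + \lambda & \text{if } \lambda \leq \lambda_i(x_{2,i}), \\ T_i \Sigma_i^*(\lambda) + \dfrac{V_i(\kappa_i + \lambda)(2A_i\Sigma_i^*(\lambda) + W_i)}{C_i^2 \, (\Sigma_i^*(\lambda))^2} & \text{if } \lambda_i(x_{2,i}) < \lambda < \lambda_i(x_{e,i}), \\ T_i x_{e,i} & \text{if } \lambda_i(x_{e,i}) \leq \lambda, \end{cases} \]

where in the second case $\Sigma_i^*(\lambda)$ is the unique positive root of (17).

Proof. The indexability comes from the fact that the indices $\lambda(\Sigma)$ are verified to be
monotonically increasing functions of $\Sigma$. Inverting the relation we obtain $\Sigma_{th}(\lambda)$ as the variance
for which we are indifferent between the active and passive actions. As we increase $\lambda$, $\Sigma_{th}(\lambda)$
increases and the passive region (the interval $[0, \Sigma_{th}(\lambda)]$) increases. □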
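The closed-form expressions collected above are easy to evaluate numerically. The following sketch covers only the case $C \neq 0$, $T \neq 0$, $W > 0$; the function names are hypothetical, and the threshold is obtained here by bisection on the (increasing) index rather than by solving the cubic (17) directly, which is a design choice of this sketch and not of the paper.

```python
import math

def are_roots(A, C, V, W):
    """Roots x1 < 0 < x2 of 2*A*x + W - (C**2/V)*x**2 = 0 (needs C != 0, W > 0)."""
    c = C * C / V
    s = math.sqrt(A * A + c * W)
    return (A - s) / c, (A + s) / c

def whittle_index(S, A, C, V, W, T, kappa):
    """Whittle index lambda(Sigma) of one scalar site, using (13), (16), (15)."""
    if C == 0 or T == 0:
        return -kappa
    c = C * C / V
    x1, x2 = are_roots(A, C, V, W)
    xe = -W / (2 * A) if A < 0 else math.inf
    if S <= x2:
        return -kappa + T * S * S / (S - x1)
    if S < xe:
        return -kappa + c * T * S ** 3 / (2 * (A * S + W))
    return -kappa + c * T * S * S / (2 * abs(A))

def threshold(lam, A, C, V, W, T, kappa, tol=1e-12):
    """Invert the index by bisection; returns 0 when lam <= -kappa."""
    if lam <= -kappa:
        return 0.0
    lo, hi = 0.0, 1.0
    while whittle_index(hi, A, C, V, W, T, kappa) < lam:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if whittle_index(mid, A, C, V, W, T, kappa) < lam:
            lo = mid
        else:
            hi = mid
    return hi

def gamma_site(lam, A, C, V, W, T, kappa):
    """Per-site dual value gamma^i(lambda) for C != 0 and T != 0."""
    c = C * C / V
    x1, x2 = are_roots(A, C, V, W)
    if lam <= whittle_index(x2, A, C, V, W, T, kappa):
        return T * x2 + kappa + lam
    if A < 0:
        xe = -W / (2 * A)
        if lam >= whittle_index(xe, A, C, V, W, T, kappa):
            return T * xe
    S = threshold(lam, A, C, V, W, T, kappa)
    return T * S + (kappa + lam) * (2 * A * S + W) / (c * S * S)

def dual_value(lam, sites, M):
    """gamma(lambda) of (20); sites is a list of dicts with keys A, C, V, W, T, kappa."""
    return sum(gamma_site(lam, **s) for s in sites) - lam * M
```

Since $\gamma(\lambda)$ is concave, the bound can then be obtained by any one-dimensional search over `lam`, for example a golden-section or ternary search on `dual_value`.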



4 Multidimensional Systems 

Generalizing the computations of the previous section to multidimensional systems requires
solving the corresponding optimal control problem in higher dimensions, for which it is not
clear that a closed form solution exists. Moreover, we have considered in section 3 a particular
case of the sensor scheduling problem where all sensors are identical. We now return to
the general multidimensional problem and sensors with possibly distinct characteristics, as
described in the introduction.

For the infinite-horizon average cost problem, we show that computing the value of a lower
bound similar to the one presented in section 3 reduces to a convex optimization problem
involving, at worst, Linear Matrix Inequalities (LMI) whose size grows polynomially with
the essential parameters of the problem. Moreover, one can further decompose the computation of
this convex program into N coupled subproblems as in the standard restless bandit case.



4.1 Performance Bound 

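For convenience, let us repeat the deterministic optimal control problem under consideration:

\begin{align}
\min_{\pi} \; & \limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^{N} \left\{ \mathrm{Tr}(T_i \Sigma_i(t)) + \sum_{j=1}^{M} \kappa_{ij} \pi_{ij}(t) \right\} dt, \tag{21} \\
\text{s.t.} \; & \dot{\Sigma}_i(t) = A_i \Sigma_i + \Sigma_i A_i^\top + W_i - \Sigma_i \left( \sum_{j=1}^{M} \pi_{ij}(t) C_{ij}^\top V_{ij}^{-1} C_{ij} \right) \Sigma_i, \quad i = 1, \dots, N, \tag{22} \\
& \pi_{ij}(t) \in \{0, 1\}, \quad \forall t \geq 0, \; i = 1, \dots, N, \; j = 1, \dots, M, \notag \\
& \sum_{i=1}^{N} \pi_{ij}(t) \leq 1, \quad \forall t \geq 0, \; j = 1, \dots, M, \tag{23} \\
& \sum_{j=1}^{M} \pi_{ij}(t) \leq 1, \quad \forall t \geq 0, \; i = 1, \dots, N, \tag{24} \\
& \Sigma_i(0) = \Sigma_{i,0}, \quad i = 1, \dots, N. \notag
\end{align}

Here we consider the constraints (1) and (3), but any combination of inequality and equality
constraints from (1)-(4) can be used without change in the argument for the derivation of
the performance bound. We define the following quantities:

\[ \tilde{\pi}_{ij}(T) = \frac{1}{T} \int_0^T \pi_{ij}(t) \, dt, \quad \forall T > 0. \tag{25} \]

Since $\pi_{ij}(t) \in \{0, 1\}$ we must have $0 \leq \tilde{\pi}_{ij}(T) \leq 1$. Our first goal, inspired by the idea
already exploited in the restless bandit problem, is to obtain a lower bound on the cost of
the finite-horizon optimal control problem in terms of the numbers $\tilde{\pi}_{ij}(T)$ instead of the
functions $\pi_{ij}(t)$.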

It will be easier to work with the information matrices

\[ Q_i(t) = \Sigma_i^{-1}(t). \]

Note that invertibility of $\Sigma_i(t)$ is guaranteed by our assumptions, as a consequence of [26,
theorem 21.1]. Hence we replace the dynamics (22) by the equivalent

\[ \dot{Q}_i = -Q_i A_i - A_i^\top Q_i - Q_i W_i Q_i + \sum_{j=1}^{M} \pi_{ij}(t) C_{ij}^\top V_{ij}^{-1} C_{ij}, \quad i = 1, \dots, N. \tag{26} \]

Let us also define, for all $T$,

\[ \tilde{\Sigma}_i(T) := \frac{1}{T} \int_0^T \Sigma_i(t) \, dt, \quad \tilde{Q}_i(T) := \frac{1}{T} \int_0^T Q_i(t) \, dt. \]

By linearity of the trace operator, we can rewrite the objective function

\[ \limsup_{T \to \infty} \sum_{i=1}^{N} \left\{ \mathrm{Tr}(T_i \tilde{\Sigma}_i(T)) + \sum_{j=1}^{M} \kappa_{ij} \tilde{\pi}_{ij}(T) \right\}. \]

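Let $\mathbb{S}^n$, $\mathbb{S}^n_+$, $\mathbb{S}^n_{++}$ denote the set of symmetric, symmetric positive semidefinite and symmetric
positive definite matrices respectively. A function $f : \mathbb{S}^n \to \mathbb{S}^n$ is called matrix convex if
and only if for all $X, Y \in \mathbb{S}^n$ and $\alpha \in [0, 1]$, we have

\[ f(\alpha X + (1 - \alpha) Y) \preceq \alpha f(X) + (1 - \alpha) f(Y), \]

where $\preceq$ refers to the usual partial order on $\mathbb{S}^n$, i.e., $A \preceq B$ if and only if $B - A \in \mathbb{S}^n_+$.
Equivalently, $f$ is matrix convex if the scalar function $X \mapsto z^\top f(X) z$ is convex for all vectors
$z$. The following lemma will be useful.

Lemma 7. The functions

\[ X \mapsto X^{-1} \;\; (X \in \mathbb{S}^n_{++}), \qquad X \mapsto XWX \;\; (X \in \mathbb{S}^n), \]

for $W \in \mathbb{S}^n_+$, are matrix convex.

Proof. See [27, p. 76, p. 110]. □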

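A consequence of this lemma is that Jensen's inequality is valid for these functions. We use
it first as follows

\[ \forall T, \quad \left( \frac{1}{T} \int_0^T \Sigma_i(t) \, dt \right)^{-1} \preceq \frac{1}{T} \int_0^T \Sigma_i(t)^{-1} \, dt = \tilde{Q}_i(T), \]

hence

\[ \forall T, \quad \tilde{\Sigma}_i(T) \succeq (\tilde{Q}_i(T))^{-1}, \]

and so

\[ \mathrm{Tr}(T_i \tilde{\Sigma}_i(T)) \geq \mathrm{Tr}(T_i (\tilde{Q}_i(T))^{-1}). \]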
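The two Jensen-type inequalities that follow from matrix convexity can be checked numerically on sampled matrices, with finite averages standing in for the time averages. This is only an illustrative sanity check under that assumption; the helper names below are hypothetical.

```python
import numpy as np

def random_pd(rng, n):
    """Draw a random symmetric positive definite n x n matrix."""
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)

def jensen_gap_inverse(mats):
    """Mean of the inverses minus inverse of the mean.

    Positive semidefinite by matrix convexity of X -> X^{-1} on PD matrices.
    """
    mean = sum(mats) / len(mats)
    mean_inv = sum(np.linalg.inv(X) for X in mats) / len(mats)
    return mean_inv - np.linalg.inv(mean)

def jensen_gap_quadratic(mats, W):
    """Mean of X W X minus (mean X) W (mean X); PSD when W is PSD."""
    mean = sum(mats) / len(mats)
    return sum(X @ W @ X for X in mats) / len(mats) - mean @ W @ mean

def min_eig(S):
    """Smallest eigenvalue of the symmetric part of S."""
    return float(np.linalg.eigvalsh(0.5 * (S + S.T)).min())
```

Calling `min_eig` on either gap should return a nonnegative number up to rounding error, which mirrors the inequalities used in the derivation of the bound.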
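Next, integrating (26) and letting $Q_{i,0} = \Sigma_{i,0}^{-1}$, we have

\[ \frac{1}{T}(Q_i(T) - Q_{i,0}) = -\tilde{Q}_i(T) A_i - A_i^\top \tilde{Q}_i(T) - \frac{1}{T} \int_0^T Q_i(t) W_i Q_i(t) \, dt + \sum_{j=1}^{M} \tilde{\pi}_{ij}(T) C_{ij}^\top V_{ij}^{-1} C_{ij}. \]

Using Jensen's inequality and lemma 7 again, we have

\[ \frac{1}{T} \int_0^T Q_i(t) W_i Q_i(t) \, dt \succeq \tilde{Q}_i(T) W_i \tilde{Q}_i(T), \]

and so we obtain

\[ \frac{1}{T}(Q_i(T) - Q_{i,0}) \preceq -\tilde{Q}_i(T) A_i - A_i^\top \tilde{Q}_i(T) - \tilde{Q}_i(T) W_i \tilde{Q}_i(T) + \sum_{j=1}^{M} \tilde{\pi}_{ij}(T) C_{ij}^\top V_{ij}^{-1} C_{ij}. \tag{27} \]

Last, since $Q_i(T) \succ 0$, this implies, for all $T$,

\[ \tilde{Q}_i(T) A_i + A_i^\top \tilde{Q}_i(T) + \tilde{Q}_i(T) W_i \tilde{Q}_i(T) - \sum_{j=1}^{M} \tilde{\pi}_{ij}(T) C_{ij}^\top V_{ij}^{-1} C_{ij} \preceq \frac{Q_{i,0}}{T}. \tag{28} \]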



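So we see that for a fixed policy $\pi$ and any time $T$, the quantity

\[ \sum_{i=1}^{N} \left\{ \mathrm{Tr}(T_i \tilde{\Sigma}_i(T)) + \sum_{j=1}^{M} \kappa_{ij} \tilde{\pi}_{ij}(T) \right\} \tag{29} \]

is lower bounded by the quantity

\[ \sum_{i=1}^{N} \left\{ \mathrm{Tr}(T_i (\tilde{Q}_i(T))^{-1}) + \sum_{j=1}^{M} \kappa_{ij} \tilde{\pi}_{ij}(T) \right\}, \]

where the matrices $\tilde{Q}_i(T)$ and the numbers $\tilde{\pi}_{ij}(T)$ are subject to the constraints (28) as well
as

\[ 0 \leq \tilde{\pi}_{ij}(T) \leq 1, \quad \sum_{i=1}^{N} \tilde{\pi}_{ij}(T) \leq 1, \; j = 1, \dots, M, \quad \sum_{j=1}^{M} \tilde{\pi}_{ij}(T) \leq 1, \; i = 1, \dots, N. \]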

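Hence for any $T$, the quantity $Z^*(T)$ defined below is a lower bound on the value of (29) for
any choice of policy $\pi$:

\begin{align}
Z^*(T) = \min_{Q_i, p_{ij}} \; & \sum_{i=1}^{N} \left\{ \mathrm{Tr}(T_i Q_i^{-1}) + \sum_{j=1}^{M} \kappa_{ij} p_{ij} \right\}, \tag{30} \\
\text{s.t.} \; & Q_i A_i + A_i^\top Q_i + Q_i W_i Q_i - \sum_{j=1}^{M} p_{ij} C_{ij}^\top V_{ij}^{-1} C_{ij} \preceq \frac{Q_{i,0}}{T}, \quad i = 1, \dots, N, \tag{31} \\
& Q_i \succ 0, \quad i = 1, \dots, N, \notag \\
& 0 \leq p_{ij} \leq 1, \quad i = 1, \dots, N, \; j = 1, \dots, M, \notag \\
& \sum_{i=1}^{N} p_{ij} \leq 1, \quad j = 1, \dots, M, \notag \\
& \sum_{j=1}^{M} p_{ij} \leq 1, \quad i = 1, \dots, N. \notag
\end{align}

Consider now the following program, where the right-hand side of (31) has been replaced by
0:

\begin{align}
Z^* = \min_{Q_i, p_{ij}} \; & \sum_{i=1}^{N} \left\{ \mathrm{Tr}(T_i Q_i^{-1}) + \sum_{j=1}^{M} \kappa_{ij} p_{ij} \right\}, \tag{32} \\
\text{s.t.} \; & Q_i A_i + A_i^\top Q_i + Q_i W_i Q_i - \sum_{j=1}^{M} p_{ij} C_{ij}^\top V_{ij}^{-1} C_{ij} \preceq 0, \quad i = 1, \dots, N, \tag{33} \\
& Q_i \succ 0, \quad i = 1, \dots, N, \notag \\
& 0 \leq p_{ij} \leq 1, \quad i = 1, \dots, N, \; j = 1, \dots, M, \notag \\
& \sum_{i=1}^{N} p_{ij} \leq 1, \quad j = 1, \dots, M, \notag \\
& \sum_{j=1}^{M} p_{ij} \leq 1, \quad i = 1, \dots, N. \notag
\end{align}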

Defining $\delta := 1/T$, and rewriting with a slight abuse of notation $Z^*(\delta)$ instead of $Z^*(T)$
for $\delta$ positive, we also define $Z^*(0) = Z^*$, where $Z^*$ is given by (32). Note that $Z^*(0)$ is
finite, since we can find a feasible solution as follows. For each $i$, we choose a set of indices
$J_i = \{j_1, \dots, j_{n_i}\} \subset \{1, \dots, M\}$ such that $(A_i, \tilde{C}_i)$ is detectable, as in assumption 2. Once a
set $J_i$ has been chosen for each $i$, we form the matrix $\hat{P}$ with elements $\hat{p}_{ij} = 1_{\{j \in J_i\}}$. Finally,
we form a matrix $P$ with elements $p_{ij}$ satisfying the constraints and nonzero exactly where
the $\hat{p}_{ij}$ are nonzero. Such a matrix is easy to find if we consider the inequality constraints (1)
and (3). If equality constraints are involved instead, such a matrix $P$ exists as a consequence
of Birkhoff's theorem [28], see theorem 8. Now we consider the quadratic inequality (33) for
some value of $i$. From the detectability assumption 2 and the choice of $p_{ij}$, we deduce that
the pair $(A_i, \hat{C}_i)$, with

\[ \hat{C}_i = \begin{bmatrix} \sqrt{p_{i1}} \, C_{i1}^\top V_{i1}^{-1/2} & \cdots & \sqrt{p_{iM}} \, C_{iM}^\top V_{iM}^{-1/2} \end{bmatrix}^\top, \tag{34} \]

is detectable. Also note that

\[ \hat{C}_i^\top \hat{C}_i = \sum_{j=1}^{M} p_{ij} C_{ij}^\top V_{ij}^{-1} C_{ij}. \]

Together with the controllability assumption 3, we then know that (33) has a positive definite
solution $Q_i$ [29, theorem 2.4.25]. Hence $Z^*(0)$ is finite.

We can also define $Z^*(\delta)$ for $\delta < 0$, by changing the right-hand side of (31) into $\delta Q_{i,0} = -|\delta| Q_{i,0}$. We have that $Z^*(\delta)$ is finite for $\delta < 0$ small enough. Indeed, passing the term $\delta Q_{i,0}$ to the left-hand side, this can then be seen as a perturbation of the matrix $\hat C_i$ above, and for $|\delta|$ small enough, detectability, which is an open condition, is preserved. Now we will see below that (30), (32) are convex programs. It is then a standard result of perturbation analysis (see e.g. [27, p. 250]) that $Z^*(\delta)$ is a convex function of $\delta$, hence continuous on the interior of its domain, in particular continuous at $\delta = 0$. So

$$
\limsup_{T \to \infty} Z^*(T) = \lim_{T \to \infty} Z^*(T) = Z^*.
$$



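Finally, for any policy $\pi$, we obtain the following lower bound on the achievable cost:

$$
\limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^N \Big\{ \mathrm{Tr}(T_i \Sigma_i^\pi(t)) + \sum_{j=1}^M \kappa_{ij} \pi_{ij}(t) \Big\}\, dt \;\ge\; \lim_{T \to \infty} Z^*(T) = Z^*.
$$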



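We now show how to compute $Z^*$ by solving a convex program involving linear matrix inequalities. For each $i$, introduce a new (slack) matrix variable $R_i$. Since $\tilde Q_i \succ 0$, $R_i \succeq \tilde Q_i^{-1}$ is equivalent, by taking the Schur complement, to

$$
\begin{bmatrix} R_i & I \\ I & \tilde Q_i \end{bmatrix} \succeq 0,
$$

and the Riccati inequality (33) can be rewritten

$$
\begin{bmatrix} \tilde Q_i A_i + A_i^T \tilde Q_i - \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} & \tilde Q_i W_i^{1/2} \\ W_i^{1/2} \tilde Q_i & -I \end{bmatrix} \preceq 0.
$$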
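We finally obtain, dropping the tildes from the notation $\tilde Q_i$, the semidefinite program

$$
\begin{aligned}
Z^* = \min_{R_i,\, Q_i,\, p_{ij}} \quad & \sum_{i=1}^N \Big\{ \mathrm{Tr}(T_i R_i) + \sum_{j=1}^M \kappa_{ij} p_{ij} \Big\} && (35) \\
\text{s.t.} \quad & \begin{bmatrix} R_i & I \\ I & Q_i \end{bmatrix} \succeq 0, \quad i = 1, \dots, N, \\
& \begin{bmatrix} Q_i A_i + A_i^T Q_i - \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} & Q_i W_i^{1/2} \\ W_i^{1/2} Q_i & -I \end{bmatrix} \preceq 0, \quad i = 1, \dots, N, \\
& 0 \le p_{ij} \le 1, \quad i = 1, \dots, N, \; j = 1, \dots, M, \\
& \sum_{i=1}^N p_{ij} \le 1, \quad j = 1, \dots, M, && (36) \\
& \sum_{j=1}^M p_{ij} \le 1, \quad i = 1, \dots, N.
\end{aligned}
$$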

Hence solving the program (35) provides a lower bound on the achievable cost for the original 
optimal control problem. 
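
As a sanity check on the two matrix inequalities above, the following worked equations sketch what they reduce to for a scalar system observed through a single sensor. The lower-case symbols $a, w, c, v, q, r, p$ are shorthand introduced only for this illustration (they stand for $A_i, W_i, C_{ij}, V_{ij}, Q_i, R_i, p_{ij}$ with indices dropped), and $w > 0$, $p > 0$ is assumed.

$$
\begin{aligned}
2 a q + w q^2 - p\,\frac{c^2}{v} \le 0, \; q > 0
\quad &\Longleftrightarrow \quad 0 < q \le q_{\max}(p) := \frac{-a + \sqrt{a^2 + w\, p\, c^2 / v}}{w}, \\
\begin{bmatrix} r & 1 \\ 1 & q \end{bmatrix} \succeq 0, \; q > 0
\quad &\Longleftrightarrow \quad r \ge \frac{1}{q}, \\
\min r = \frac{1}{q_{\max}(p)} &= \frac{a + \sqrt{a^2 + w\, p\, c^2 / v}}{p\, c^2 / v}.
\end{aligned}
$$

The last expression is the positive root of the scalar algebraic Riccati equation $2 a \sigma + w - p (c^2 / v) \sigma^2 = 0$, so in this special case the relaxation simply replaces the on/off measurement by a sensor whose information rate is scaled by the fraction $p$.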



4.2 Problem Decomposition 

It is well known that efficient methods exist to solve (35) in polynomial time, which implies a
computation time polynomial in the number of variables of the original problem. Still, as the
number of targets increases, the large LMI (35) becomes difficult to solve. Note however that
it can be decomposed into small coupled LMIs, following the standard dual decomposition
approach already used for the restless bandit problem. This decomposition is very useful for
solving large-scale programs with many systems. For completeness, we present the
argument in more detail below.

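We first note that (36) is the only constraint which links the $N$ subproblems together. So we form the Lagrangian

$$
L(R, Q, p; \lambda) = \sum_{i=1}^N \Big\{ \mathrm{Tr}(T_i R_i) + \sum_{j=1}^M (\kappa_{ij} + \lambda_j) p_{ij} \Big\} - \sum_{j=1}^M \lambda_j,
$$

where $\lambda \in \mathbb{R}^M_+$ is a vector of Lagrange multipliers. We would take $\lambda \in \mathbb{R}^M$ if we had the constraint (2) instead of (1). Now the dual function is

$$
G(\lambda) = \sum_{i=1}^N G_i(\lambda) - \sum_{j=1}^M \lambda_j, \qquad (37)
$$

with

$$
\begin{aligned}
G_i(\lambda) = \min_{R_i,\, Q_i,\, \{p_{ij}\}_{1 \le j \le M}} \quad & \mathrm{Tr}(T_i R_i) + \sum_{j=1}^M (\kappa_{ij} + \lambda_j) p_{ij} && (38) \\
\text{s.t.} \quad & \begin{bmatrix} R_i & I \\ I & Q_i \end{bmatrix} \succeq 0, \\
& \begin{bmatrix} Q_i A_i + A_i^T Q_i - \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} & Q_i W_i^{1/2} \\ W_i^{1/2} Q_i & -I \end{bmatrix} \preceq 0, \\
& 0 \le p_{ij} \le 1, \quad j = 1, \dots, M, \\
& \sum_{j=1}^M p_{ij} \le 1.
\end{aligned}
$$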



The optimization algorithm then proceeds as follows [30, chap. 11]. We choose an initial value $\lambda^1 \ge 0$ and set $k = 1$.

1. For $i = 1, \dots, N$, compute $R_i^k$, $Q_i^k$, $\{p_{ij}^k\}_{1 \le j \le M}$, optimal solution of (38), and the value $G_i(\lambda^k)$.

2. The value of the dual function at $\lambda^k$ is given by (37). A supergradient of $G$ at $\lambda^k$ is given by

$$
\Big[ \sum_{i=1}^N p_{i1}^k - 1, \; \dots, \; \sum_{i=1}^N p_{iM}^k - 1 \Big].
$$

3. Compute $\lambda^{k+1}$ in order to maximize $G(\lambda)$. We can do this by using a supergradient algorithm, or any preferred nonsmooth optimization algorithm. Let $k := k + 1$ and go to step 1, or stop if convergence is satisfactory.



Because the initial program (35) is convex, we know that the optimal value of the dual 
optimization problem is equal to the optimal value of the primal. Moreover, the optimal 
variables of the primal are obtained at step 1 of the algorithm above once convergence has 
been reached. 
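
The three steps above can be sketched in a few lines for a toy instance. The sketch below is not the general LMI algorithm: it assumes scalar systems and a single sensor (so there is one multiplier), it replaces each subproblem (38) by a grid search over the closed-form scalar Riccati solution, and it uses a constant-step projected supergradient update. The function names, the grid, the step size and the system parameters are all hypothetical choices made for illustration.

```python
import math

def steady_variance(p, a, w, c=1.0, v=1.0):
    """Positive root of 2*a*s + w - p*(c^2/v)*s^2 = 0, valid for p > 0."""
    g = p * c * c / v
    return (a + math.sqrt(a * a + g * w)) / g

def subproblem(lam, a, w, weight, kappa, grid):
    """Scalar stand-in for (38): minimize weight*s(p) + (kappa + lam)*p over the grid."""
    def cost(p):
        return weight * steady_variance(p, a, w) + (kappa + lam) * p
    best_p = min(grid, key=cost)
    return best_p, cost(best_p)

def dual_ascent(systems, step=0.5, iters=300, n_grid=100):
    """Projected supergradient ascent on the single multiplier lam >= 0."""
    grid = [k / n_grid for k in range(1, n_grid + 1)]  # p = 0 excluded: variance unbounded
    lam = 0.0
    for _ in range(iters):
        sols = [subproblem(lam, *s, grid) for s in systems]   # step 1
        supergrad = sum(p for p, _ in sols) - 1.0             # step 2
        lam = max(0.0, lam + step * supergrad)                # step 3
    sols = [subproblem(lam, *s, grid) for s in systems]
    dual_value = sum(val for _, val in sols) - lam            # formula (37) with M = 1
    return lam, [p for p, _ in sols], dual_value

# Hypothetical systems: (a, w, trace weight, measurement cost).
systems = [(0.1, 1.0, 1.0, 0.0), (0.5, 1.0, 1.0, 0.0)]
lam, fractions_of_time, dual_value = dual_ascent(systems)
```

By weak duality the returned `dual_value` never exceeds the cost of a feasible allocation such as an even split of the sensor between the two systems, which gives a quick consistency check.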



4.3 Open-loop Periodic Policies Achieving the Performance Bound 
4.3.1 Definition of the Policies 

In this section we describe a sequence of open-loop policies that can approach arbitrarily 
closely the lower bound computed by (35), thus proving that this bound is tight. These poli- 
cies are periodic switching strategies using a schedule obtained from the optimal parameters 
Pij. Assuming no switching times or costs, their performance approaches the bound as the 
length of the switching cycle decreases toward 0. 






Let $P = [p_{ij}]_{1 \le i \le N,\, 1 \le j \le M}$ be the matrix of optimal parameters obtained in the solution of (35). We assume here that constraints (1) and (3) were enforced, which is the most general case for the discussion in this section. Hence $P$ verifies

$$
0 \le p_{ij} \le 1, \quad i = 1, \dots, N, \; j = 1, \dots, M,
$$

$$
\sum_{i=1}^N p_{ij} \le 1, \; j = 1, \dots, M, \quad \text{and} \quad \sum_{j=1}^M p_{ij} \le 1, \; i = 1, \dots, N.
$$

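A doubly substochastic matrix of dimension $n$ is an $n \times n$ matrix $A = [a_{ij}]_{1 \le i, j \le n}$ which satisfies

$$
0 \le a_{ij} \le 1, \quad i, j = 1, \dots, n,
$$

$$
\sum_{i=1}^n a_{ij} \le 1, \; j = 1, \dots, n, \quad \text{and} \quad \sum_{j=1}^n a_{ij} \le 1, \; i = 1, \dots, n.
$$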

If $M = N$, $P$ is therefore a doubly substochastic matrix. Else if $M < N$ (resp. $N < M$) we can add $N - M$ columns of zeros (resp. $M - N$ rows of zeros) to $P$ to obtain a doubly substochastic matrix. In any case, we call the resulting doubly substochastic matrix $\tilde P = [\tilde p_{ij}]$. If rows have been added, this is equivalent to the initial problem with additional "dummy systems". If columns are added, these correspond to using "dummy sensors". Dummy systems (i.e., for $i > N$) are not included in the objective function (the corresponding $T_i$ is 0), and a dummy sensor (i.e., for $j > M$) is associated formally to the measurement noise covariance matrix $V_{ij}^{-1} = 0$ for all $i$, in effect producing no measurement. In the following we assume that $\tilde P$ is an $N \times N$ doubly substochastic matrix, but the discussion in the $M \times M$ case is identical. Doubly substochastic matrices have been intensively studied, and the material used in the following can be found in the book of Marshall and Olkin [31]. In particular, we have the following corollary of a classical theorem of Birkhoff [28], which says that a doubly stochastic matrix is a convex combination of permutation matrices.

Theorem 8 ([32]). The set of $N \times N$ doubly substochastic matrices is the convex hull of the
set $\mathcal{P}_0$ of $N \times N$ matrices which have at most one unit in each row and each column, and all
other entries zero.



Hence for the doubly substochastic matrix $\tilde P$, there exists a set of positive numbers $\phi_k$ and matrices $P_k \in \mathcal{P}_0$ such that

$$
\tilde P = \sum_{k=1}^K \phi_k P_k, \quad \text{with} \quad \sum_{k=1}^K \phi_k = 1, \quad \text{for some integer } K. \qquad (39)
$$

One way of computing this decomposition is to first extend $\tilde P$ to the $2N \times 2N$ doubly stochastic matrix

$$
\bar P = \begin{bmatrix} \tilde P & I - D_r \\ I - D_c & \tilde P^T \end{bmatrix},
$$

where $r_1, \dots, r_N$ and $c_1, \dots, c_N$ are the row sums and column sums of $\tilde P$, and $D_r = \mathrm{diag}(r_1, \dots, r_N)$, $D_c = \mathrm{diag}(c_1, \dots, c_N)$. Then there is an algorithm that runs in time $O(N^{4.5})$ [31, 33] and provides the decomposition

$$
\bar P = \sum_{k=1}^K \phi_k \bar P_k,
$$

with $K = (2N - 1)^2 + 1$ and where the $\bar P_k$'s are permutation matrices of size $2N \times 2N$. The decomposition (39) is finally obtained by deleting the last $N$ rows and columns of $\bar P_k$ to obtain the matrices $P_k$, $k = 1, \dots, K$.
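
The extension and decomposition just described can be sketched as follows. This is a simple greedy variant, not the $O(N^{4.5})$ algorithm of [31, 33]: it repeatedly finds a perfect matching on the support of the extended matrix with Kuhn's augmenting-path method and peels off the corresponding permutation. Exact rational arithmetic (`fractions.Fraction`) is an assumption made here to avoid tolerance issues, and all function names are hypothetical.

```python
from fractions import Fraction

def extend_to_doubly_stochastic(P):
    """Embed an N x N doubly substochastic P into a 2N x 2N doubly stochastic matrix."""
    n = len(P)
    r = [sum(row) for row in P]                                # row sums
    c = [sum(P[i][j] for i in range(n)) for j in range(n)]     # column sums
    E = [[Fraction(0)] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            E[i][j] = Fraction(P[i][j])            # top-left block: P
            E[n + i][n + j] = Fraction(P[j][i])    # bottom-right block: P^T
        E[i][n + i] = 1 - Fraction(r[i])           # top-right block: I - D_r
        E[n + i][i] = 1 - Fraction(c[i])           # bottom-left block: I - D_c
    return E

def perfect_matching(E):
    """Kuhn's algorithm on the positive entries of E; returns match[col] = row."""
    m = len(E)
    match = [-1] * m

    def try_row(i, seen):
        for j in range(m):
            if E[i][j] > 0 and not seen[j]:
                seen[j] = True
                if match[j] == -1 or try_row(match[j], seen):
                    match[j] = i
                    return True
        return False

    for i in range(m):
        if not try_row(i, [False] * m):
            raise ValueError("no perfect matching: input is not doubly substochastic")
    return match

def birkhoff(P):
    """Return a list of (phi_k, P_k) with sum_k phi_k * P_k == P and sum_k phi_k == 1."""
    n = len(P)
    E = extend_to_doubly_stochastic(P)
    terms = []
    while any(x > 0 for row in E for x in row):
        match = perfect_matching(E)
        phi = min(E[match[j]][j] for j in range(2 * n))
        for j in range(2 * n):
            E[match[j]][j] -= phi                  # zeroes at least one entry per pass
        # Keep only the top-left N x N block of the permutation matrix.
        Pk = [[1 if match[j] == i else 0 for j in range(n)] for i in range(n)]
        terms.append((phi, Pk))
    return terms

P = [[Fraction(1, 2), Fraction(1, 4)], [Fraction(1, 4), Fraction(1, 2)]]
terms = birkhoff(P)
```

Each pass removes at least one positive entry of the extended matrix, so the loop terminates; some of the truncated matrices $P_k$ may be all zero, which corresponds to time during which no real sensor measures any real system.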

Note that any matrix $A = [a_{ij}]_{i,j} \in \mathcal{P}_0$ represents a valid sensor/system assignment (for the
system with additional dummy systems or sensors), where sensor $j$ is measuring system $i$
if and only if $a_{ij} = 1$. With the decomposition (39), we now consider a family of periodic
switching policies parametrized by a positive number $\epsilon$ representing a time interval over
which the switching schedule is executed completely. For a given value of $\epsilon$, the policy is
defined as follows:

1. At time $t = l\epsilon$, $l \in \mathbb{N}$, associate sensor $j$ to system $i$ as specified by the matrix $P_1$ of the
representation (39). Run the corresponding continuous-time Kalman filters, keeping
this sensor/system association for a duration $\phi_1 \epsilon$.

2. At time $t = (l + \phi_1)\epsilon$, switch to the assignment specified by $P_2$. Run the corresponding
continuous-time Kalman filters until $t = (l + \phi_1 + \phi_2)\epsilon$.

3. Repeat the switching procedure, switching to matrix $P_{i+1}$ at time $t = (l + \phi_1 + \dots + \phi_i)\epsilon$,
for $i = 1, \dots, K - 1$.

4. At time $t = (l + \phi_1 + \dots + \phi_K)\epsilon = (l + 1)\epsilon$, start the switching sequence again at step
1 with $P_1$ and repeat the steps above.

It is easy to see that the matrices $P_i$, $i = 1, \dots, K$, never specify that a "dummy sensor"
should execute a measurement or that a "dummy system" should be measured, since from
the decomposition (39) this would correspond to nonzero entries in the columns or rows
added to $P$ to form $\tilde P$.
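
The four steps of the policy amount to a lookup from the current time to the index of the assignment matrix in force. A minimal sketch, assuming the weights $\phi_k$ are available as a Python list `phis` and using a hypothetical function name:

```python
from bisect import bisect_right
from itertools import accumulate

def active_assignment(t, phis, eps):
    """0-based index k such that the matrix P_{k+1} of (39) is in force at time t."""
    frac = (t % eps) / eps            # position inside the current cycle, in [0, 1)
    ends = list(accumulate(phis))     # phi_1, phi_1 + phi_2, ..., (approximately) 1
    # A switch happening exactly at a boundary already belongs to the next matrix.
    # The min() guards against the cumulative sum falling slightly below 1.
    return min(bisect_right(ends, frac), len(phis) - 1)
```

For example, with `phis = [0.25, 0.75]` and `eps = 2.0`, the first assignment is used on $[0, 0.5)$, the second on $[0.5, 2)$, and the pattern repeats with period 2.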

4.3.2 Performance of the Periodic Switching Policies 

Let us fix $\epsilon > 0$ in the definition of the switching policy, and consider now, for this policy,
the evolution of the covariance matrix $\Sigma_i^\epsilon(t)$ for the estimation error on the state of system
$i$. The superscript indicates the dependence on the period $\epsilon$ of the policy. First we have

Lemma 9. For all $i \in \{1, \dots, N\}$, the estimation error covariance $\Sigma_i^\epsilon(t)$ converges as $t \to \infty$
to a periodic function $\bar\Sigma_i^\epsilon(t)$ of period $\epsilon$.
Proof. Fix $i \in \{1, \dots, N\}$. Let $\sigma_i(t) \in \{0, 1, \dots, N\}$ be the function specifying which sensor
is observing system $i$ at time $t$ under the switching policy. By convention $\sigma_i(t) = 0$ means
that no sensor is scheduled to observe system $i$, and $\sigma_i(t) = j$ means that sensor $j$ measures
system $i$. Note from the remark following the description of the switching policies that in fact
we have $\sigma_i(t) \in \{0, \dots, M\}$, i.e., the policy never schedules measurements by dummy sensors.
Similarly, if instead we were considering the situation $M > N$ and $\tilde P$ an $M \times M$ matrix, then
we would have $\sigma_i(t) = 0$ for $i \in \{N + 1, \dots, M\}$ and all $t$. Note also that $\sigma_i(t)$ is a piecewise
constant, $\epsilon$-periodic function. The switching times of $\sigma_i(t)$ are $t = (l + \phi_1 + \dots + \phi_{k-1})\epsilon$, for
$k = 1, \dots, K$ and $l \in \mathbb{N}$.

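The covariance matrix $\Sigma_i^\epsilon(t)$ obeys the following periodic Riccati differential equation (PRE):

$$
\dot\Sigma_i^\epsilon(t) = A_i \Sigma_i^\epsilon(t) + \Sigma_i^\epsilon(t) A_i^T + W_i - \Sigma_i^\epsilon(t) \big(C_i^\epsilon(t)\big)^T C_i^\epsilon(t)\, \Sigma_i^\epsilon(t), \qquad (40)
$$

where $C_i^\epsilon(t) := V_{i\sigma_i(t)}^{-1/2} C_{i\sigma_i(t)}$ is a piecewise constant, $\epsilon$-periodic matrix valued function, and we use the convention $V_{ij}^{-1/2} = C_{ij} = 0$ when $j = 0$. We now show that $(A_i, C_i^\epsilon(\cdot))$ is detectable. Let $j_1, \dots, j_K$ be the successive values taken by the function $\sigma_i(t)$ over the period $\epsilon$. From the definition of detectability for linear periodic systems and its modal characterization [34, p. 130], we immediately deduce that the pair $(A_i, C_i^\epsilon(\cdot))$ is not detectable if and only if there exists an eigenpair $(\lambda, x)$ for $A_i$, with $\mathrm{Re}(\lambda) \ge 0$, $x \ne 0$, such that

$$
A_i x = \lambda x, \quad \text{and} \quad C_i^\epsilon(t) e^{A_i t} x = e^{\lambda t} C_i^\epsilon(t) x = 0, \; \forall t \in [0, \epsilon],
$$
$$
\text{hence} \quad C_{ij_1} x = \dots = C_{ij_K} x = 0. \qquad (41)
$$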

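Let us denote by $p_{k,ij}$ the $(i, j)$th element of the matrix $P_k$ in the decomposition (39). We have $p_{k,ij} = 1_{\{j = j_k\}}$ with the above definition of $j_k$, including the case $j_k = 0$ (no measurement), which gives $p_{k,ij} = 0$ for all $j \in \{1, \dots, N\}$. Then we can write

$$
\begin{aligned}
\sum_{k=1}^K \phi_k C_{ij_k}^T V_{ij_k}^{-1} C_{ij_k}
&= \sum_{k=1}^K \phi_k \Big( \sum_{j=1}^N p_{k,ij} C_{ij}^T V_{ij}^{-1} C_{ij} \Big) \\
&= \sum_{j=1}^N \Big( \sum_{k=1}^K \phi_k p_{k,ij} \Big) C_{ij}^T V_{ij}^{-1} C_{ij} \\
&= \sum_{j=1}^N \tilde p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \\
&= \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} = \hat C_i^T \hat C_i, && (42)
\end{aligned}
$$

where the next-to-last equality uses the fact that $\tilde p_{ij} = p_{ij}$ for $j \le M$ and $\tilde p_{ij} = 0$ for $j \ge M + 1$, and $\hat C_i$ was defined in (34). Note that we now consider this definition of $\hat C_i$ for the optimal parameters $p_{ij}$ provided by the solution of (35). Then (41) and (42) imply $\|\hat C_i x\|^2 = 0$, so $\hat C_i x = 0$, i.e., $(A_i, \hat C_i)$ is not detectable. But the parameters $p_{ij}$ being optimal for the program (35), this would imply that this program is not feasible [29, p. 68], a contradiction with our discussion following (34). So $(A_i, C_i^\epsilon(\cdot))$ must be detectable, and together with our assumption 1, this implies by the result of [35, p. 95] that

$$
\lim_{t \to \infty} \big( \Sigma_i^\epsilon(t) - \bar\Sigma_i^\epsilon(t) \big) = 0,
$$

where $\bar\Sigma_i^\epsilon(t)$ is the strong solution of the PRE, which is $\epsilon$-periodic.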

□ 

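Next, denote by $\tilde\Sigma_i(t)$ the solution to the following Riccati differential equation (RDE):

$$
\dot{\tilde\Sigma}_i = A_i \tilde\Sigma_i + \tilde\Sigma_i A_i^T + W_i - \tilde\Sigma_i \Big( \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \Big) \tilde\Sigma_i, \quad \tilde\Sigma_i(0) = \Sigma_{i,0}. \qquad (43)
$$

Assumptions 2 and 3, together with our discussion of the implied detectability of the pair $(A_i, \hat C_i)$ (see (42)), guarantee that $\tilde\Sigma_i(t)$ converges to a positive definite limit denoted $\Sigma_i^*$. Moreover, $\Sigma_i^*$ is stabilizing and is the unique positive definite solution to the algebraic Riccati equation (ARE):

$$
A_i \Sigma_i + \Sigma_i A_i^T + W_i - \Sigma_i \Big( \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \Big) \Sigma_i = 0. \qquad (44)
$$

The next lemma says that the periodic function $\bar\Sigma_i^\epsilon(t)$ oscillates in a neighborhood of $\Sigma_i^*$.

Lemma 10. For all $t \in \mathbb{R}_+$, we have $\bar\Sigma_i^\epsilon(t) - \Sigma_i^* = O(\epsilon)$ as $\epsilon \to 0$.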



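Proof. The function $t \mapsto \bar\Sigma_i^\epsilon(t)$ of lemma 9 is the strong periodic solution of the PRE (40). It is $\epsilon$-periodic. From Radon's lemma [29, p. 90], which gives a representation of the solution to a Riccati differential equation as the ratio of solutions to a linear ODE, we also know that $\bar\Sigma_i^\epsilon$ is $C^\infty$ on each interval where $\sigma_i(t)$ is constant, where $\sigma_i(t)$ is the switching signal defined in the proof of lemma 9.

Let $\hat\Sigma_i^\epsilon$ be the average of $t \mapsto \bar\Sigma_i^\epsilon(t)$:

$$
\hat\Sigma_i^\epsilon = \frac{1}{\epsilon} \int_0^\epsilon \bar\Sigma_i^\epsilon(t)\, dt.
$$

From the preceding remarks, it is easy to deduce that for all $t$, we have $\bar\Sigma_i^\epsilon(t) - \hat\Sigma_i^\epsilon = O(\epsilon)$. Now, averaging the PRE (40) over the interval $[0, \epsilon]$, we obtain

$$
A_i \hat\Sigma_i^\epsilon + \hat\Sigma_i^\epsilon A_i^T + W_i - \frac{1}{\epsilon} \int_0^\epsilon \bar\Sigma_i^\epsilon(t) \big(C_i^\epsilon(t)\big)^T C_i^\epsilon(t)\, \bar\Sigma_i^\epsilon(t)\, dt = \frac{1}{\epsilon} \big( \bar\Sigma_i^\epsilon(\epsilon) - \bar\Sigma_i^\epsilon(0) \big) = 0,
$$

where $C_i^\epsilon(t)$ was defined below display (40). Expanding this equation in powers of $\epsilon$, we get

$$
A_i \hat\Sigma_i^\epsilon + \hat\Sigma_i^\epsilon A_i^T + W_i - \hat\Sigma_i^\epsilon \Big( \frac{1}{\epsilon} \int_0^\epsilon \big(C_i^\epsilon(t)\big)^T C_i^\epsilon(t)\, dt \Big) \hat\Sigma_i^\epsilon + R(\epsilon) = 0,
$$

where $R(\epsilon) = O(\epsilon)$. Let $j_k := \sigma_i(t)$ for $t \in [(l + \phi_1 + \dots + \phi_{k-1})\epsilon, (l + \phi_1 + \dots + \phi_k)\epsilon]$. We can then rewrite, using (42),

$$
\frac{1}{\epsilon} \int_0^\epsilon \big(C_i^\epsilon(t)\big)^T C_i^\epsilon(t)\, dt = \sum_{k=1}^K \phi_k C_{ij_k}^T V_{ij_k}^{-1} C_{ij_k} = \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij}.
$$

So we obtain

$$
A_i \hat\Sigma_i^\epsilon + \hat\Sigma_i^\epsilon A_i^T + (W_i + R(\epsilon)) - \hat\Sigma_i^\epsilon \Big( \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \Big) \hat\Sigma_i^\epsilon = 0.
$$

Note moreover that for $\epsilon$ sufficiently small, $\hat\Sigma_i^\epsilon$ is the unique positive definite stabilizing solution of this ARE, using the fact that controllability of $(A_i, W_i^{1/2})$ is an open condition. Now comparing this ARE to the ARE (44), and since the stabilizing solution of an ARE is a real analytic function of the parameters [36], we deduce that $\hat\Sigma_i^\epsilon - \Sigma_i^* = O(\epsilon)$, and the lemma.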

□ 

Theorem 11. Let $Z^\epsilon$ denote the performance of the periodic switching policy with period
$\epsilon$. Then $Z^\epsilon - Z^* = O(\epsilon)$ as $\epsilon \to 0$, where $Z^*$ is the performance bound (35). Hence the
switching policy approaches the lower bound arbitrarily closely as the period tends to 0.



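Proof. We have

$$
Z^\epsilon = \limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^N \Big\{ \mathrm{Tr}(T_i \Sigma_i^\epsilon(t)) + \sum_{j=1}^M \kappa_{ij} \pi_{ij}^\epsilon(t) \Big\}\, dt,
$$

where $\pi^\epsilon$ is the sensor/system assignment of the switching policy. First by using a transformation similar to (42) and using the convention $\kappa_{ij} = 0$ for $j \in \{0\} \cup \{M + 1, \dots, N\}$ (no measurement or measurement by a dummy sensor), we have for system $i$

$$
\begin{aligned}
\frac{1}{T} \int_0^T \sum_{j=1}^M \kappa_{ij} \pi_{ij}^\epsilon(t)\, dt
&= \frac{1}{T} \sum_{n=0}^{\lfloor T/\epsilon \rfloor - 1} \int_{n\epsilon}^{(n+1)\epsilon} \sum_{j=1}^M \kappa_{ij} \pi_{ij}^\epsilon(t)\, dt + \frac{1}{T} \int_{\lfloor T/\epsilon \rfloor \epsilon}^{T} \sum_{j=1}^M \kappa_{ij} \pi_{ij}^\epsilon(t)\, dt \\
&= \frac{\epsilon}{T} \Big\lfloor \frac{T}{\epsilon} \Big\rfloor \sum_{k=1}^K \phi_k \kappa_{ij_k} + \frac{1}{T} \int_{\lfloor T/\epsilon \rfloor \epsilon}^{T} \sum_{j=1}^M \kappa_{ij} \pi_{ij}^\epsilon(t)\, dt \\
&= \frac{\epsilon}{T} \Big\lfloor \frac{T}{\epsilon} \Big\rfloor \sum_{j=1}^M \kappa_{ij} p_{ij} + \frac{1}{T} \int_{\lfloor T/\epsilon \rfloor \epsilon}^{T} \sum_{j=1}^M \kappa_{ij} \pi_{ij}^\epsilon(t)\, dt, && (45)
\end{aligned}
$$

where the $j_k$'s were defined in the proof of lemma 9. Hence

$$
\limsup_{T \to \infty} \frac{1}{T} \int_0^T \sum_{i=1}^N \sum_{j=1}^M \kappa_{ij} \pi_{ij}^\epsilon(t)\, dt \le \sum_{i=1}^N \sum_{j=1}^M \kappa_{ij} p_{ij}.
$$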
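Next, from lemmas 9 and 10, it follows readily that $\limsup_{t \to \infty} \Sigma_i^\epsilon(t) - \Sigma_i^* = O(\epsilon)$. It is well known [37] that under our assumptions $\Sigma_i^*$ is the minimal (for the partial order on $\mathbb{S}^n_+$) positive definite solution of the quadratic matrix inequality

$$
A_i \Sigma_i + \Sigma_i A_i^T + W_i - \Sigma_i \Big( \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \Big) \Sigma_i \preceq 0.
$$

Hence for the $p_{ij}$ obtained from the computation of the lower bound (35), these matrices $\Sigma_i^*$ minimize

$$
\begin{aligned}
& \sum_{i=1}^N \mathrm{Tr}(T_i \Sigma_i) \\
\text{s.t.} \quad & A_i \Sigma_i + \Sigma_i A_i^T + W_i - \Sigma_i \Big( \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \Big) \Sigma_i \preceq 0, \quad i = 1, \dots, N. && (46)
\end{aligned}
$$

Changing variable to $Q_i = \Sigma_i^{-1}$, and multiplying the inequalities (46) on the left and right by $Q_i$ yields

$$
Q_i A_i + A_i^T Q_i + Q_i W_i Q_i - \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \preceq 0,
$$

and hence we recover (33). In conclusion, the covariance matrices resulting from the switching policies approach within $O(\epsilon)$ as $\epsilon \to 0$ the covariance matrices which are obtained from the lower bound on the achievable cost. The theorem follows, by upper bounding the supremum limit of a sum by the sum of the supremum limits to get $Z^* \le Z^\epsilon \le Z^* + O(\epsilon)$. □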

Remark 12. Since the bound computed in (35) is tight, and since it is easy to see that
the performance bound of section 3 is at least as good as the bound (35) for the simplified
problem of that section, we can conclude that the two bounds coincide and that section 3
gives an alternative way of computing the solution of (35) in the case of identical sensors
and one-dimensional systems. Using the closed form expression for the dual function (20),
we only need to optimize over the unique Lagrange multiplier $\lambda$, independently of the number
$N$ of systems, instead of solving the LMI (35), for which the number of variables grows with
$N$.



4.3.3 Transient Behavior of the Switching Policies 

Before we conclude, we take a look at the transient behavior of the switching policies. We
show that over a finite time interval, $\Sigma_i^\epsilon(t)$ remains close to $\tilde\Sigma_i(t)$, solution of the "averaged"
RDE (43). Together with the previous result of lemma 9 on the asymptotic behavior, we
see then that $\Sigma_i^\epsilon(t)$ and $\tilde\Sigma_i(t)$ remain close for all $t$. For a matrix $A$, we denote by $\|A\|_\infty$ the
maximal absolute value of the entries of $A$.

Lemma 13. For all $0 < T_0 < \infty$, there exist constants $\epsilon_0 > 0$ and $M_0 > 0$ such that for all
$0 < \epsilon < \epsilon_0$ and for all $t \in [0, T_0]$, we have $\|\Sigma_i^\epsilon(t) - \tilde\Sigma_i(t)\|_\infty \le M_0 \epsilon$.
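Proof. As in the proof of lemma 10, by Radon's lemma we know that $\Sigma_i^\epsilon$ is $C^\infty$ on each interval where $\sigma_i(t)$ is constant. We have then, over the interval $t \in [l\epsilon, (l + \phi_1)\epsilon]$, for $l \in \mathbb{N}$:

$$
\Sigma_i^\epsilon((l + \phi_1)\epsilon) = \Sigma_i^\epsilon(l\epsilon) + \phi_1 \epsilon \big[ A_i \Sigma_i^\epsilon(l\epsilon) + \Sigma_i^\epsilon(l\epsilon) A_i^T + W_i - \Sigma_i^\epsilon(l\epsilon) C_{ij_1}^T V_{ij_1}^{-1} C_{ij_1} \Sigma_i^\epsilon(l\epsilon) \big] + O(\epsilon^2), \qquad (47)
$$

where as before we denote $j_k := \sigma_i(t)$ for $t \in [(l + \phi_1 + \dots + \phi_{k-1})\epsilon, (l + \phi_1 + \dots + \phi_k)\epsilon]$. Now over the period $t \in [(l + \phi_1)\epsilon, (l + \phi_1 + \phi_2)\epsilon]$, we have:

$$
\begin{aligned}
\Sigma_i^\epsilon((l + \phi_1 + \phi_2)\epsilon) = {} & \Sigma_i^\epsilon((l + \phi_1)\epsilon) + \phi_2 \epsilon \big[ A_i \Sigma_i^\epsilon((l + \phi_1)\epsilon) + \Sigma_i^\epsilon((l + \phi_1)\epsilon) A_i^T + W_i \\
& - \Sigma_i^\epsilon((l + \phi_1)\epsilon) C_{ij_2}^T V_{ij_2}^{-1} C_{ij_2} \Sigma_i^\epsilon((l + \phi_1)\epsilon) \big] + O(\epsilon^2).
\end{aligned}
$$

Using (47), we deduce that

$$
\begin{aligned}
\Sigma_i^\epsilon((l + \phi_1 + \phi_2)\epsilon) = {} & \Sigma_i^\epsilon(l\epsilon) + \epsilon [\phi_1 + \phi_2] \{ A_i \Sigma_i^\epsilon(l\epsilon) + \Sigma_i^\epsilon(l\epsilon) A_i^T + W_i \} \\
& - \epsilon\, \Sigma_i^\epsilon(l\epsilon) \big( \phi_1 C_{ij_1}^T V_{ij_1}^{-1} C_{ij_1} + \phi_2 C_{ij_2}^T V_{ij_2}^{-1} C_{ij_2} \big) \Sigma_i^\epsilon(l\epsilon) + O(\epsilon^2).
\end{aligned}
$$

By immediate induction, and since $\phi_1 + \dots + \phi_K = 1$, we then have

$$
\Sigma_i^\epsilon((l + 1)\epsilon) = \Sigma_i^\epsilon(l\epsilon) + \epsilon \Big[ A_i \Sigma_i^\epsilon(l\epsilon) + \Sigma_i^\epsilon(l\epsilon) A_i^T + W_i - \Sigma_i^\epsilon(l\epsilon) \Big( \sum_{k=1}^K \phi_k C_{ij_k}^T V_{ij_k}^{-1} C_{ij_k} \Big) \Sigma_i^\epsilon(l\epsilon) \Big] + O(\epsilon^2).
$$

Hence by (42), $\Sigma_i^\epsilon$ verifies the relation

$$
\Sigma_i^\epsilon((l + 1)\epsilon) = \Sigma_i^\epsilon(l\epsilon) + \epsilon \Big[ A_i \Sigma_i^\epsilon(l\epsilon) + \Sigma_i^\epsilon(l\epsilon) A_i^T + W_i - \Sigma_i^\epsilon(l\epsilon) \Big( \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij} \Big) \Sigma_i^\epsilon(l\epsilon) \Big] + O(\epsilon^2). \qquad (48)
$$

But notice now that the approximation (48) is also true by definition for $\tilde\Sigma_i(t)$ over the interval $t \in [l\epsilon, (l + 1)\epsilon]$. Next, consider the following identity for $Q$, $X$ and $\tilde X$ symmetric matrices:

$$
\begin{aligned}
& AX + XA^T - XQX - (A\tilde X + \tilde X A^T - \tilde X Q \tilde X) \\
& \quad = (A - XQ)(X - \tilde X) + (X - \tilde X)(A - XQ)^T + (X - \tilde X) Q (X - \tilde X).
\end{aligned}
$$

Letting $Q = \sum_{j=1}^M p_{ij} C_{ij}^T V_{ij}^{-1} C_{ij}$ and $\Delta_i^\epsilon(l) = \tilde\Sigma_i(l\epsilon) - \Sigma_i^\epsilon(l\epsilon)$, we obtain from this identity

$$
\Delta_i^\epsilon(l + 1) = \Delta_i^\epsilon(l) + \epsilon \big\{ (A_i - \tilde\Sigma_i(l\epsilon) Q) \Delta_i^\epsilon(l) + \Delta_i^\epsilon(l) (A_i - \tilde\Sigma_i(l\epsilon) Q)^T + \Delta_i^\epsilon(l) Q \Delta_i^\epsilon(l) \big\} + O(\epsilon^2).
$$

Note that $\Delta_i^\epsilon(0) = 0$ and $\tilde\Sigma_i(t)$ is bounded, so by immediate induction we have

$$
\Delta_i^\epsilon(l) = \sum_{k=1}^{l} R_k(\epsilon), \quad \text{where } R_k(\epsilon) = O(\epsilon^2) \text{ for all } k.
$$

Fix $T_0 > 0$. We have then

$$
\Delta_i^\epsilon\Big( \Big\lceil \frac{T_0}{\epsilon} \Big\rceil \Big) = \sum_{k=1}^{\lceil T_0/\epsilon \rceil} R_k(\epsilon) = O(\epsilon).
$$

This means that there exist constants $\epsilon_0, M_0 > 0$ such that

$$
\Big\| \Sigma_i^\epsilon\Big( \Big\lceil \frac{T_0}{\epsilon} \Big\rceil \epsilon \Big) - \tilde\Sigma_i\Big( \Big\lceil \frac{T_0}{\epsilon} \Big\rceil \epsilon \Big) \Big\|_\infty \le M_0 \epsilon, \quad \text{for all } 0 < \epsilon < \epsilon_0.
$$

It is easy to see from the argument above that a similar approximation is in fact valid for all $t$ up to time $\lceil T_0/\epsilon \rceil \epsilon$. □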
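
The statement of lemma 13 can be illustrated numerically on a scalar example. The sketch below assumes one scalar system measured during the first fraction `p` of every cycle of length `eps` and unobserved otherwise, and integrates both the switched Riccati equation and the averaged RDE (43) with a plain forward Euler scheme. The parameter values, the function names and the integration scheme are hypothetical choices for illustration; they are not the ones used for Figure 1.

```python
def riccati_rhs(s, a, w, gain):
    """Scalar Riccati right-hand side 2*a*s + w - gain*s^2."""
    return 2.0 * a * s + w - gain * s * s

def max_gap(eps, p=0.5, a=0.1, w=1.0, c=1.0, v=1.0, s0=1.0,
            t_end=5.0, steps_per_cycle=200):
    """Largest |switched - averaged| variance seen on [0, t_end] (Euler grid points)."""
    dt = eps / steps_per_cycle
    on_steps = round(p * steps_per_cycle)   # Euler steps per cycle with the sensor on
    full_gain = c * c / v                   # information rate while measuring
    s_switch, s_avg, gap = s0, s0, 0.0
    for _ in range(round(t_end / eps)):     # whole cycles only
        for step in range(steps_per_cycle):
            gain = full_gain if step < on_steps else 0.0
            s_switch += dt * riccati_rhs(s_switch, a, w, gain)
            s_avg += dt * riccati_rhs(s_avg, a, w, p * full_gain)
            gap = max(gap, abs(s_switch - s_avg))
    return gap

gap_fast = max_gap(0.05)   # short switching period
gap_slow = max_gap(0.4)    # long switching period
```

Shortening the period should shrink the gap roughly in proportion to `eps`, in line with the bound $M_0 \epsilon$ of the lemma.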
[Figure: plot titled "Covariance Trajectories"]

Figure 1: Comparison of the variance trajectories under Whittle's index policy (solid blue
curves), the periodic switching policy (oscillating red curves), and the greedy policy
(converging green curves). The solid black curves correspond to $\tilde\Sigma_i(t)$, solution of the RDE (43).
Here a single sensor switches between two scalar systems. The period $\epsilon$ was chosen to be 0.05.
The system parameters are $A_1 = 0.1$, $A_2 = 2$, $C_i = Q_i = R_i = 1$, $\kappa_i = 0$ for $i = 1, 2$. The
dashed lines are the steady-state values that could be achieved with two identical sensors,
each measuring one system. The performance of the Whittle policy is 7.98, which is optimal
(i.e., matches the bound). The performance of the greedy policy is 9.2. Note that the greedy
policy makes the variances converge, while Whittle's policy makes the Whittle indices (not
shown) converge. The switching policy spends 23% of its time measuring system 1 and 77%
of its time measuring system 2.

4.3.4 Numerical Simulation 

Figure 1 compares the covariance trajectories for Whittle's index policy, the periodic switch- 
ing policy and the greedy policy (measuring the system with highest mean square error on 
the estimate) for a simple problem with one sensor switching between two scalar systems. 
Significant improvements over the greedy policy can be obtained in general by using the 
periodic switching policies or the Whittle policy. An important computational advantage 
of the Whittle policy for large-scale problems with a limited number of identical sensors is 
that using the closed form solution of the indices provided in section 3.2.3, it requires only 
ordering N numbers (which is the same computational cost as for the greedy policy), whereas 
designing the open-loop switching policy requires computing a solution of the program (35). 
5 Conclusion



In this paper, we have considered an attention-control problem in continuous time, which 
consists in scheduling sensor/target assignments and running the corresponding Kalman 
filters. We proved that the bound obtained from a RBP type relaxation is tight, assuming 
we allow the sensors to switch arbitrarily fast between the targets. An open question is to 
characterize the performance of the restless bandit index policy derived in the scalar case. 
It was found experimentally that the performance of this policy seems to match the bound, 
but we do not have a proof of this fact. An advantage of the Whittle index policy over 
the switching policies is that it is in feedback form. Obtaining optimal feedback policies 
for the multidimensional case would also be of interest. For practical applications, the 
main limitation of our model concerns the absence of switching costs and delays. Still, 
the optimal solution obtained in the absence of such costs should provide insight into the 
derivation of heuristics for more complex models. Additionally there are numerous sensor 
scheduling applications, such as for telemetry-data aerospace systems or radar waveform 
selection systems, where the switching costs are not too important. 